Implement SparseK Attention mechanism — new GGML operator with CPU backend (GPU planned next) #16817
Conversation
Hi @CISC and @NeoZhangJianyu, we'd appreciate it if you could review our PR implementing the new SparseK Attention operator. This contribution was developed jointly by both of us (@yael-works and @GittyBurstein). Thanks in advance for your time and feedback!

We are talking about this SparseK, right?

Yes! @CISC
You need to rebase to fix the Server CI failures; please also fix the whitespace issues.
Hi @CISC, I'd really appreciate it if you could review the code itself so we can move forward with the merge. Thanks!
Yes, as mentioned, it will be resolved if you rebase, it's ok. :)

So, my main challenge is where/what/when will SparseK be used? I can't recall seeing any actual implementation being used in the wild. This also means we don't really have any reference to test it against...
@CISC Once this PR is merged, the operator can be connected to higher-level use cases. Thank you!!
I think @ggerganov will have to weigh in on this.
New Attention Mechanism: SparseK Attention (CPU Backend)
This PR introduces a new attention mechanism called SparseK Attention, implemented from scratch as a new operator within the GGML framework, currently with CPU backend support.
Overview
SparseK Attention is a selective and efficient attention mechanism inspired by Flash Attention, but it introduces additional sparsity through:
- per-query top-k key selection (`k_top`)
- a local attention window around each query (`win_local`)
- strided global keys (`stride_global`)
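To make the sparsity pattern concrete, below is a minimal, hypothetical sketch (not code from this PR) of a predicate deciding whether key position j is structurally visible to query position i under `win_local` and `stride_global`. The `k_top` component depends on the attention scores, so it is selected at run time rather than by a static rule like this:

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical illustration of SparseK's structural sparsity, assuming causal
// attention: a key position j is kept for query position i when it lies inside
// the local window or on a global stride. This is NOT the PR's code, only a
// sketch of the idea behind the win_local / stride_global parameters.
static bool sparsek_structurally_visible(int64_t i, int64_t j,
                                         int64_t win_local, int64_t stride_global) {
    if (j > i) {
        return false;                                   // causal mask (assumption)
    }
    if (i - j < win_local) {
        return true;                                    // local attention window
    }
    if (stride_global > 0 && j % stride_global == 0) {
        return true;                                    // periodic "global" key
    }
    return false;                                       // masked unless chosen by top-k
}
```

Keys outside the window and stride would then only contribute if they rank among the `k_top` highest-scoring keys for that query.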
Implementation Details
- New operator `GGML_OP_SPARSEK_ATTN` defined in `ggml.h` and `ggml.c`.
- New API function `ggml_sparsek_attn()` that creates a computation node with the parameters (`k_top`, `win_local`, `stride_global`).
- CPU backend implementation in `ggml-cpu/ops.h`, `ggml-cpu/ops.cpp`, and `ggml-cpu.c`.

The CPU version includes computation of the scaled attention scores QKᵀ / √d.
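As a quick orientation for reviewers, here is a hypothetical usage sketch showing how the new node could be built into a GGML graph. The exact signature of `ggml_sparsek_attn()` (Q/K/V tensors plus the three integer parameters) and the tensor shapes are assumptions for illustration; the declaration in `ggml.h` from this PR is authoritative:

```c
#include "ggml.h"

// Hypothetical usage sketch only. The signature of ggml_sparsek_attn() is an
// assumption (Q, K, V plus k_top / win_local / stride_global); see ggml.h in
// this PR for the actual declaration.
int main(void) {
    struct ggml_init_params ip = {
        /*.mem_size   =*/ 128*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    const int64_t d_head = 64, n_tokens = 512, n_heads = 8;   // illustrative shapes

    struct ggml_tensor * q = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d_head, n_tokens, n_heads);
    struct ggml_tensor * k = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d_head, n_tokens, n_heads);
    struct ggml_tensor * v = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d_head, n_tokens, n_heads);

    // Illustrative SparseK parameters: keep the 32 highest-scoring keys per query,
    // a local window of 128 tokens, and every 64th token as a global key.
    struct ggml_tensor * out = ggml_sparsek_attn(ctx, q, k, v,
                                                 /*k_top        =*/ 32,
                                                 /*win_local    =*/ 128,
                                                 /*stride_global=*/ 64);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);

    // ... fill q/k/v with data and evaluate the graph with the CPU backend as usual ...

    ggml_free(ctx);
    return 0;
}
```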
Next Steps
Our next goal is to extend SparseK Attention to the SYCL (GPU) backend.
We are submitting this initial CPU implementation first to ensure review, integration, and baseline correctness before introducing GPU acceleration.
Co-Authors
Co-authored-by: Yael Shuker ([email protected])
Co-authored-by: Gitty Burstein ([email protected])